Skip to content

Fix: parse Accept header q-values in wpsc_get_accept_header()#1057

Open
thisismyurl wants to merge 3 commits into
Automattic:trunkfrom
thisismyurl:fix/accept-header-q-value-parsing
Open

Fix: parse Accept header q-values in wpsc_get_accept_header()#1057
thisismyurl wants to merge 3 commits into
Automattic:trunkfrom
thisismyurl:fix/accept-header-q-value-parsing

Conversation

@thisismyurl

Copy link
Copy Markdown

Summary

Fixes #1045wpsc_get_accept_header() was using a naive str_contains() check that matched any JSON media type anywhere in the Accept string, ignoring RFC 7231 §5.3.2 quality values. The result: monitoring tools like New Relic Synthetics (and Datadog/Pingdom/UptimeRobot) that send valid HTML-preferring headers such as:

Accept: text/html,application/xhtml+xml,application/json;q=0.9,...

were misclassified as JSON requests. Two downstream effects:

  1. wp_cache_serve_cache_file() refused to serve the cached file on every synthetic check.
  2. A separate application/json cache bucket was populated on each check, triggering a fresh page build every time.

Changes

wp-cache-phase2.php

  • Extracted a new wpsc_parse_accept_header( $raw_accept, $json_types ) helper. It parses q-values per RFC 7231 §5.3.2 and classifies the request as application/json only when a known JSON type has a strictly higher q-value than text/html. Ties resolve to text/html (safe default: serve the cached page).
  • wpsc_get_accept_header() now delegates to this helper after the empty-header early-return. The static memoization and wpsc_accept_headers filter contract are unchanged. Callers are unaffected.

Edge cases handled:

  • Malformed q-values (non-numeric, out-of-range) — treated as q=1.0, no warning or fatal.
  • */* wildcard — provides effective html q-value only when text/html is not explicitly present.
  • Whitespace around media-range tokens.

tests/php/WpscParseAcceptHeaderTest.php (new file)

14 unit tests covering the seven acceptance criteria from the issue brief plus additional edge cases (wildcard behaviour, custom types via the filter, typical browser headers, out-of-range q-values). The test exercises wpsc_parse_accept_header() directly — it has no WordPress dependencies, so no WP test scaffolding is required.

PHPUnit 12.5.28 — 14 / 14 (100%) — 0.003s

Note on test harness: the issue brief mentions a harness being stood up in #1051. I've added the test class here so the coverage lands with the fix rather than waiting. If #1051 introduces a shared bootstrap or test base class, this file is easy to migrate.

Acceptance criteria from the issue brief

  • New Relic Synthetics header (text/html implicit q=1.0, application/json;q=0.9) → text/html
  • Accept: application/json alone → application/json
  • Accept: text/html,application/json (tie, both q=1.0) → text/html
  • Accept: application/json,text/html;q=0.9 (json strictly higher) → application/json
  • Accept: application/ld+json;q=1.0,text/html;q=0.8 (extended type via filter) → application/json
  • Absent or empty Accept → text/html
  • Malformed q-values → no warning or fatal, deterministic result
  • wpsc_accept_headers filter types participate in q-value comparison
  • Function remains free of late-loaded dependencies; static memoization preserved

The previous implementation used a naive str_contains() check: if any
known JSON media type appeared anywhere in the Accept header string, the
request was classified as application/json. This ignored RFC 7231 §5.3.2
quality values (q=), so a valid HTML-preferring header such as the one
sent by New Relic Synthetics —

  Accept: text/html,application/xhtml+xml,application/json;q=0.9,...

— was misclassified as a JSON request. Two downstream effects:

1. wp_cache_serve_cache_file() refused to serve the cached file.
2. A separate application/json cache bucket was populated on every
   synthetic check, triggering a fresh page build each time.

This commit extracts a new wpsc_parse_accept_header() helper that
parses q-values and classifies the request as application/json only
when a known JSON type has a strictly higher q-value than text/html.
Ties resolve to text/html (safe default: serve the cached page).
The wpsc_accept_headers filter continues to work — filtered types
participate in the q-value comparison as JSON types.

Fixes Automattic#1045
Copilot AI review requested due to automatic review settings May 29, 2026 18:36

@donnchawp donnchawp left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This review was generated by AI.

Thanks for this — the direction is right and the execution is clean: extracting a testable wpsc_parse_accept_header() helper, real RFC 7231 q-value parsing, preserved memoization and the wpsc_accept_headers filter contract, malformed/out-of-range handling, and no-DB unit tests that fit nicely into the smoke-test tier being set up in #1051. The reported New Relic Synthetics case is fixed correctly.

There's one required change before merge: the */* wildcard fallback re-introduces a deliberately-fixed Fediverse regression.

The problem

This PR was written against an earlier version of the issue brief that still credited text/html with the */* quality value. That brief has since been updated to remove it, precisely because of this case. The current code:

} elseif ( null !== $wildcard_q ) {
    $effective_html_q = $wildcard_q;
}

Trace a Fediverse fetch — Accept: application/activity+json, */*:

  • html_q = null (no explicit text/html)
  • wildcard_q = 1.0 → effective_html_q = 1.0
  • json_q = 1.0
  • 1.0 > 1.0 is false → returns text/html

The same happens for any JSON client sending Accept: application/json, */*. Because the cache only ever serves text/html (wp_cache_serve_cache_file() bails on anything else), classifying these as text/html means the client passes the serve-gate and gets handed the cached HTML page instead of being bypassed to the application — the exact problem the Accept-header handling was added to prevent. The asymmetry matters here: misclassifying an HTML-preferring request as JSON only costs a cache miss; misclassifying a JSON-preferring request as HTML serves the wrong representation.

Suggested fix

Use the explicit text/html q-value only, defaulting to 0.0 when it's absent — don't fall back to */*:

// Effective html q: explicit text/html only. A */* wildcard must not lift
// text/html above an explicitly requested JSON type, or JSON/Fediverse
// clients sending e.g. "application/json, */*" get served cached HTML.
$effective_html_q = ( null !== $html_q ) ? $html_q : 0.0;

…and drop the $wildcard_q tracking. This still fixes the New Relic case (explicit text/html;q=1.0 > application/json;q=0.9), and Accept: */* on its own still resolves to text/html because json_q is 0.0 when no JSON type is present.

Test changes

  • test_wildcard_covers_html_when_not_explicit asserts the behaviour we need to remove — flip it: */*,application/json;q=0.9 should classify as application/json.
  • Add a regression test: application/json, */*application/json.
  • test_explicit_html_takes_precedence_over_wildcard and a */*-alone → text/html case both still pass unchanged.

Everything else looks good to go once the wildcard handling is removed. For full background on why JSON/Fediverse traffic is kept out of the cache entirely, see the discussion on #1045.

JSON/Fediverse clients commonly send Accept: application/json, */*. The
*/* wildcard was lifting effective_html_q to 1.0 when text/html was absent,
causing ties that resolved to text/html — serving cached HTML to clients
that explicitly requested JSON.

Fix: ignore */* entirely when computing effective_html_q. Use only the
explicit text/html q-value, defaulting to 0.0 when absent. This preserves
the New Relic Synthetics case (explicit text/html;q=1.0 beats json;q=0.9)
while correctly routing application/json, */* to the application layer.

Test changes:
- Rename and flip test_wildcard_covers_html_when_not_explicit to
  test_wildcard_does_not_cover_html_when_not_explicit (asserts JSON)
- Add test_fediverse_json_with_wildcard_classifies_as_json as a
  regression guard for the specific Fediverse case from the review

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thisismyurl

Copy link
Copy Markdown
Author

Good catch — the wildcard fallback was definitely wrong for this case. The asymmetry you described is exactly right: misclassifying a JSON client as HTML silently serves the wrong representation, while the reverse only costs a cache miss.

Pushed a follow-up that:

  • Drops $wildcard_q tracking entirely
  • Sets $effective_html_q = ( null !== $html_q ) ? $html_q : 0.0; — explicit text/html only, no wildcard fallback
  • Updates the docblock to document the intentional omission
  • Flips test_wildcard_covers_html_when_not_explicittest_wildcard_does_not_cover_html_when_not_explicit (asserts application/json)
  • Adds test_fediverse_json_with_wildcard_classifies_as_json as the regression guard for application/json, */*

The New Relic Synthetics case still works: explicit text/html;q=1.0 in the header gives effective_html_q = 1.0, which beats json;q=0.9 as before. Accept: */* alone still resolves to text/html because both html_q and json_q are 0.0 and the > comparison is false.

@donnchawp donnchawp left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This review was generated by AI.

The classification logic is now correct — thanks for turning it around. effective_html_q uses the explicit text/html q-value only (defaulting to 0.0), the */* fallback is gone, and test_fediverse_json_with_wildcard_classifies_as_json (application/json, */*application/json) plus the flipped wildcard test lock in the Fediverse behaviour. No objections to the production change in wp-cache-phase2.php.

The two red checks are both test-file hygiene, not logic — all 15 assertions actually pass.

PHPUnit (PHP 8.2) — doc-comment metadata deprecation

The class-level /** @covers wpsc_parse_accept_header */ doc-comment triggers a PHPUnit deprecation, and both phpunit.11.xml.dist and phpunit.12.xml.dist set failOnPhpunitDeprecation="true". The "only 8.2 fails" puzzle is a PHPUnit-version split: on 8.2 composer downgrades to PHPUnit 11, which still reads doc-comment metadata and emits the deprecation → exit 1; on 8.3+ it uses PHPUnit 12, which ignores doc-comment metadata entirely, so no deprecation fires. The run log confirms it: OK, but there were issues! Tests: 15, Assertions: 15, PHPUnit Deprecations: 1.

Fix: replace the @covers doc-comment with the attribute, which works on both 11 and 12 and matches the repo's existing Jetpack.PHPUnit.Attributes sniff:

use PHPUnit\Framework\Attributes\CoversFunction;

#[CoversFunction( 'wpsc_parse_accept_header' )]
class WpscParseAcceptHeaderTest extends TestCase {

PHPCS — 7 errors, all in tests/php/WpscParseAcceptHeaderTest.php

  1. "A file should either contain function declarations or OO structure declarations, but not both." The apply_filters() stub function collides with the test class. wpsc_parse_accept_header() (the function under test) doesn't call apply_filters, so the stub is probably unnecessary; if requiring wp-cache-phase2.php turns out to need it, move it into tests/php/bootstrap.php. Either way the test file should contain only the class.
  2. Missing @var tag on the $json_types property docblock.
  3. The four // method comments (the wildcard-behaviour notes) need to be /** ... */ docblocks.
  4. One doc-comment short description starts lowercase ("application/activity+json classified…") — capitalize it.

Minor

The comment in test_malformed_q_value_treated_as_default still says "effective html_q from */*=0.7" — leftover from the old wildcard logic. The assertion is correct, but the reasoning should read "html absent → effective_html_q = 0.0; json q=1.0 > 0.0 → application/json."

Once the @covers becomes an attribute and the PHPCS items are cleared, this should go green. Logic LGTM.

@donnchawp

Copy link
Copy Markdown
Contributor

Thanks @thisismyurl for the PR. Almost there. :)

PHPUnit (PHP 8.2): replace @Covers doc-comment with #[CoversFunction]
attribute. phpunit.11.xml.dist sets failOnPhpunitDeprecation=true and
PHPUnit 11 (used on 8.2) still reads doc-comment metadata and emits a
deprecation; PHPUnit 12 (8.3+) ignores it. The attribute works on both.

PHPCS: the apply_filters() stub triggered "file should contain either
function declarations or OO structure declarations, but not both". Moved
the stub to tests/php/bootstrap.php (where it belongs — the stub is a
test infrastructure concern, not part of the test class) and fixed the
bootstrap @Package tag while there.

Remaining PHPCS fixes in WpscParseAcceptHeaderTest.php:
- $json_types property: add @var string[] to the docblock
- Four wildcard-behaviour method comments: // → /** */
- Lowercase short description: "application/activity+json classified"
  → "Classifies application/activity+json as JSON"
- Stale inline comment in test_malformed_q_value_treated_as_default:
  removed the "effective html_q from */*=0.7" reasoning (wildcard no
  longer applies); updated to "text/html absent → effective_html_q = 0.0"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thisismyurl

Copy link
Copy Markdown
Author

Pushed a follow-up addressing both CI failures.

PHPUnit deprecation (PHP 8.2) — replaced /** @covers wpsc_parse_accept_header */ with #[CoversFunction( 'wpsc_parse_accept_header' )] + use PHPUnit\Framework\Attributes\CoversFunction. The attribute works on PHPUnit 11 and 12; the doc-comment form only triggered the deprecation on 11 (the PHP 8.2 path).

PHPCS — 7 errors — all in the test file:

  • Moved the apply_filters() stub to tests/php/bootstrap.php (where it belongs as test infrastructure); fixed the bootstrap @package tag while there.
  • Added @var string[] to the $json_types property docblock.
  • Converted the four // wildcard-behaviour method comments to /** ... */ docblocks.
  • Updated the lowercase short description: "application/activity+json classified as JSON" → "Classifies application/activity+json as JSON".
  • Fixed the stale inline comment in test_malformed_q_value_treated_as_default — removed the */*=0.7 reasoning and updated to effective_html_q = 0.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

wpsc_get_accept_header() ignores Accept header quality values (q-values)

3 participants